Let $\mathcal{F} = \{f(\cdot,\theta) : \theta \in J\}$, $J$ an interval, be a family of univariate probability densities (wrt Lebesgue measure) on an interval $I$. First, a necessary and sufficient condition is proved for $\mathcal{F}$ to be identifiable whenever $\mathcal{G} \subset C_0(J)$, the class of continuous functions on $J$ vanishing at $\infty$. If $f_G$ is a $G$-mixture of the densities in $\mathcal{F}$ with $G$ unknown, an estimator $G_n$ based on $f_G$ and $\mathcal{G} = \{f(x,\cdot) : x \in I\}$ is provided such that $G_n \xrightarrow{w} G$ under certain conditions on $\mathcal{G}$. If $X_1,\dots,X_n$ are iid random variables from $f_G$, an estimator $\hat G_n$ is provided such that $\hat G_n(X_1,\dots,X_n,\cdot) \xrightarrow{w} G(\cdot)$ almost surely under certain conditions on $\mathcal{G}$ and $G$. Furthermore, it is shown that $|f_{\hat G_n}(x) - f_G(x)| \to 0$ a.s. and in $L_2$ with rates like $O(n^{-c})$ $(c > 0)$ under certain conditions on the density estimator $\hat f_G(x)$ involved in the definition of $\hat G_n$. The conditions of various theorems are verified in the case of location parameter and scale parameter families of densities.

ESTIMATION OF A MIXING DISTRIBUTION FUNCTION

J. R. Blum and V. Susarla

1. Introduction and summary. Let $f$ be a Borel measurable function from $I \times J$ to $(0,\infty)$ such that $\int_I f(x,\theta)\,dx = 1$ for each $\theta$ in $J$, where $I$ and $J$ are intervals contained in $R = (-\infty,\infty)$, and let $\mathcal{G}$ and $\mathcal{F}$ be the collections of sections of $f$ with the first coordinate (in $I$) and the second coordinate (in $J$) fixed respectively. For a probability distribution function $G$ on $J$, let

(1.1)  $f_G(x) = \int_J f(x,\theta)\,dG(\theta)$, $x$ in $I$.
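As an illustration of (1.1), the following minimal sketch evaluates $f_G$ when $G$ is discrete, so that the integral reduces to a finite sum. The normal location family, the atoms and the weights are assumptions chosen for the example; the function names are hypothetical and do not come from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density (assumed component density for this example)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mixture_density(x, atoms, weights, f=lambda x, theta: normal_pdf(x - theta)):
    """Evaluate f_G(x) = sum_i w_i * f(x, theta_i) for a discrete mixing distribution G."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(w * f(x, t) for t, w in zip(atoms, weights))


# Hypothetical G with mass 0.3 at theta = -1 and mass 0.7 at theta = 2.
atoms = [-1.0, 2.0]
weights = [0.3, 0.7]
print(mixture_density(0.0, atoms, weights))
```

A general $G$ would require a Stieltjes integral; the discrete case is shown only because the estimators constructed later are themselves discrete.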

We provide an equivalent condition for the identifiability of $\mathcal{F}$ (for the definition of identifiability, see (A1)) in Section 2. In Section 3, we consider the problem of estimating $G$ in terms of $f_G$ and $\mathcal{G}$. To obtain an estimate $G_n$ of $G$, we solve a system of equalities and inequalities and then show that $G_n$ converges weakly to ($\xrightarrow{w}$) $G$ under some conditions on $\mathcal{G}$. If $G$ and $f_G$ are unknown, but iid random variables $X_1,\dots,X_n,\dots$ are observable (this is the standard empirical Bayes situation of Robbins [4] described in Section 4), then we construct (in Section 4) estimates $\hat G_n(X_1,\dots,X_n,\cdot)$ which $\xrightarrow{w} G(\cdot)$ almost surely (a.s.) under some conditions on $\mathcal{F}$. It is then immediate that $\int \theta f(x,\theta)\,d\hat G_n(\theta) \to \int \theta f(x,\theta)\,dG(\theta)$ a.s. whenever $\theta f(x,\theta) \in C(J)$. Furthermore, it is shown that our method of construction of $\hat G_n$ provides rates for a.s. and $L_2$ convergences of $f_{\hat G_n}(x) - f_G(x)$ to zero for each $x$ under some additional conditions on $\mathcal{F}$. In Section 5, all the above results are shown to hold for location and scale parameter families of Lebesgue densities under rather weak conditions.

Sponsored by:
1) The United States Army under Contract Number DAAG29-75-C-0024;

The  results  of  Section  3 are  not  only  of  mathematical  interest,  but 
also  provide  an  intuitive  basis  for  the  results  of  Section  4.  In  Sections 
4 and  5,  we  take  I = J = R as  other  cases  can  be  treated  with  obvious 
modifications  of  the  method  presented  here.  Throughout,  G is  assumed 
to be a distribution function with support in $J$. The estimator and its
properties  are  compared  with  three  other  estimators  for  G in  Section  6. 

In  Section  4,  we  discuss  the  application  of  the  main  result  of  this  paper 
to  empirical  Bayes  estimation  problems. 

2. Identifiability. For the distribution function $G$ in (1.1) to be estimable in terms of $f_G$ and $\mathcal{F}$, it is obvious that the following condition should be satisfied.

(A1)  $f_G(x) = f_H(x)$ for all $x$ in $I$ $\Rightarrow$ $H - G = 0$.

This condition is called the Identifiability (of $\mathcal{F}$) condition. (For example, see Teicher [7].)

With $C_0(J)$ denoting the Banach space of continuous functions on the interval $J$ which vanish at $\infty$ and normed by

(2.1)  $\|g\| = \sup\{|g(y)| : y \text{ in } J\}$,

we obtain

Theorem 2.1. Let $\mathcal{G} \subset C_0(J)$. Then (A1) holds if and only if $\mathcal{G}$ generates $C_0(J)$ in the supremum norm (2.1).

Proof. Let (A1) hold. Let $B$ be the closed subspace generated by $\mathcal{G}$. If $B \ne C_0(J)$, then there exists a $g$ in $C_0(J) - B$ and a bounded linear functional $\Phi$ on $C_0(J)$ such that $\Phi(g) = 1$ and $\Phi(f^*) = 0$ for $f^*$ in $B$. Also, by the Riesz representation theorem, there exist non-decreasing non-negative functions $K_1$ and $K_2$ of bounded variation on $J$ such that

$\Phi(f^*) = \int_J f^*(y)\,d(K_1 - K_2)(y)$ for $f^*$ in $C_0(J)$.

Since $\Phi(f^*) = 0$ for $f^*$ in $B$, it follows that $\int_J f(x,\theta)\,dK_1(\theta) = \int_J f(x,\theta)\,dK_2(\theta)$ for all $x$ in $I$ which, by (A1), implies that $K_1 - K_2 =$ constant. But then, this implies that $\Phi(g) = \int_J g(y)\,d(K_1 - K_2)(y) = 0$, which is a contradiction since $\Phi(g) = 1$. Hence $\mathcal{G}$ generates $C_0(J)$.

Conversely, let $\mathcal{G}$ generate $C_0(J)$ and let (1.1) hold at $G$ and $H$ with $f_G = f_H$. We show that $G - H = 0$. By assumption

(2.2)  $\int_J f(x,\theta)\,dG(\theta) = \int_J f(x,\theta)\,dH(\theta)$ for all $x$ in $I$.

Since $\mathcal{G}$ generates $C_0(J)$ in the supremum norm, (2.2) can be extended to

(2.3)  $\int_J g(\theta)\,dG(\theta) = \int_J g(\theta)\,dH(\theta)$ for all $g$ in $C_0(J)$.

Since $\Phi(g) = \int_J g(\theta)\,dG(\theta)$ is a bounded linear functional on $C_0(J)$ whenever $G$ is of bounded variation on $J$, the uniqueness part of the Riesz representation theorem and (2.3) show that $G - H =$ constant. This completes the proof of the theorem since $G$ and $H$ are distribution functions on $J$.

3. Construction of an estimator of G in (1.1). In this section, we define an estimator $G_n$ ((3.6)) of $G$ in terms of $f_G$ and $\mathcal{G}$. We consider in detail the case $I = J = R$ only and point out the required changes if $I$ and/or $J$ are intervals. Throughout this section, the integration is over $(-\infty,\infty)$, and the limits are as $n \to \infty$ unless otherwise stated.

For a fixed partition

(3.1)  $\theta_{n,-1}\,(= -\infty) < \theta_{n,0}\,(= -n) < \theta_{n,1} < \dots < \theta_{n,m(n)}\,(= n) < \theta_{n,m(n)+1} = \infty$

with

(3.2)  $\delta_n = \max\{\theta_{n,j} - \theta_{n,j-1} \mid j = 1,\dots,m(n)\} \to 0$,

and for $x$ in $R$, and for $\ell = -1,\dots,m(n)$, let

(3.3)  $M_{n,\ell}(x) = \sup\{f(x,\theta) \mid \theta_{n,\ell} \le \theta \le \theta_{n,\ell+1}\}$,

and

(3.4)  $m_{n,\ell}(x) = \inf\{f(x,\theta) \mid \theta_{n,\ell} \le \theta \le \theta_{n,\ell+1}\}$.

Let $p_n = \{p_{n,-1},\dots,p_{n,m(n)}\}$ be such that

(3.5)
(i)  $p_{n,\ell} \ge 0$ and $\sum_{\ell=-1}^{m(n)} p_{n,\ell} = 1$,
(ii)  $\sum_{\ell=-1}^{m(n)} p_{n,\ell}\,M_{n,\ell}(x) \ge f_G(x)$, and
(iii)  $\sum_{\ell=-1}^{m(n)} p_{n,\ell}\,m_{n,\ell}(x) \le f_G(x)$,

where (ii) and (iii) hold for $x$ in $\{\theta_{n,0},\dots,\theta_{n,m(n)}\}$. Let $\mathcal{P}_n = \{p_n \mid p_n \text{ is a solution of (3.5)}\}$. That $\mathcal{P}_n$ is not empty follows since one such solution is given by $p_{n,\ell} = \int_{\theta_{n,\ell}}^{\theta_{n,\ell+1}} dG$ for $\ell = -1,\dots,m(n)$.

For any $p_n$ in $\mathcal{P}_n$, define

(3.6)  $G_n(y) = \begin{cases} 0, & y < \theta_{n,0}, \\ p_{n,-1} + p_{n,0}, & \theta_{n,0} \le y < \theta_{n,1}, \\ \sum_{j=-1}^{\ell} p_{n,j}, & \theta_{n,\ell} \le y < \theta_{n,\ell+1},\ \ell = 1,\dots,m(n). \end{cases}$

Clearly $G_n$ is a discrete distribution function on $R$.
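The step function in (3.6) can be sketched as follows. This is a minimal illustration, assuming the weights $p_{n,-1},\dots,p_{n,m(n)}$ have already been obtained from (3.5); the helper name `make_G_n` and the sample numbers are hypothetical.

```python
from bisect import bisect_right


def make_G_n(thetas, p):
    """Return the step function y -> G_n(y) of (3.6).

    thetas: the finite partition points theta_{n,0} < ... < theta_{n,m(n)}
    p:      the weights p_{n,-1}, p_{n,0}, ..., p_{n,m(n)} (one more than thetas)
    The mass of cell -1 is placed at theta_{n,0}; the mass of cell l >= 0 at theta_{n,l}.
    """
    if len(p) != len(thetas) + 1:
        raise ValueError("need exactly one more weight than partition points")
    jumps = [p[0] + p[1]] + list(p[2:])  # jump sizes at theta_0, theta_1, ..., theta_m
    cumulative, total = [], 0.0
    for jump in jumps:
        total += jump
        cumulative.append(total)

    def G_n(y):
        k = bisect_right(thetas, y)  # number of partition points <= y
        return 0.0 if k == 0 else cumulative[k - 1]

    return G_n


# Hypothetical weights on the partition -1 < 0 < 1.
G = make_G_n([-1.0, 0.0, 1.0], [0.1, 0.2, 0.3, 0.4])
print(G(-2.0), G(-1.0), G(0.5), G(1.0))
```

Placing the mass of each cell at a point of that cell is what makes the integral of $f(x,\cdot)$ against $G_n$ fall between the two sums appearing in (ii) and (iii) of (3.5).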

Note. The solution of (3.5) is a simple linear programming problem and there are efficient computational algorithms available for the solution of such inequalities. (3.5) can be solved theoretically for $p_n$ without the assumption that $x$ is in $\{\theta_{n,0},\dots,\theta_{n,m(n)}\}$, but such a solution might be difficult to obtain.
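To make the system (3.5) concrete, the sketch below checks numerically that the cell masses of a known $G$ satisfy (i) through (iii). It assumes a standard normal location family $f(x,\theta) = \varphi(x - \theta)$, for which the supremum and infimum over a cell have closed forms, and a discrete $G$ whose atoms avoid the partition points; all names and numbers are hypothetical. It does not solve the linear program itself, which would need an LP solver.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cell_sup_inf(x, a, b):
    """sup and inf of phi(x - theta) over theta in [a, b]; a or b may be infinite."""
    nearest = 0.0 if a <= x <= b else min(abs(x - a), abs(x - b))
    farthest = max(abs(x - a), abs(x - b))
    return phi(nearest), (0.0 if math.isinf(farthest) else phi(farthest))


def true_masses_are_feasible(n, step, atoms, weights):
    """Check (3.5)(i)-(iii) for p_l = G-mass of cell l, with x on the grid."""
    grid = [-n + i * step for i in range(int(round(2 * n / step)) + 1)]
    edges = [-math.inf] + grid + [math.inf]
    cells = list(zip(edges[:-1], edges[1:]))  # cells l = -1, ..., m(n)
    p = [sum(w for t, w in zip(atoms, weights) if a <= t < b) for a, b in cells]
    ok = all(q >= 0 for q in p) and abs(sum(p) - 1.0) < 1e-12  # condition (i)
    for x in grid:
        f_G = sum(w * phi(x - t) for t, w in zip(atoms, weights))
        upper = sum(q * cell_sup_inf(x, a, b)[0] for q, (a, b) in zip(p, cells))
        lower = sum(q * cell_sup_inf(x, a, b)[1] for q, (a, b) in zip(p, cells))
        ok = ok and lower <= f_G + 1e-15 and f_G <= upper + 1e-15  # (iii) and (ii)
    return ok


print(true_masses_are_feasible(3, 0.5, [-0.3, 1.2], [0.4, 0.6]))
```

The check succeeds because each atom contributes a value of $f(x,\cdot)$ lying between the infimum and supremum over its own cell.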
The result leading to $G_n \xrightarrow{w} G$ is

Theorem 3.1. Let $f(x,\cdot) \in C_0(R)$,

(A2)  $\lim_{x' \to x} \sup_\theta |f(x,\theta) - f(x',\theta)| = 0$, and

(A3)  for each $\epsilon > 0$ there exist $\delta, \delta' > 0$ such that $|x' - x| < \delta$ and $|\theta' - \theta| < \delta'$ imply $|f(x',\theta') - f(x',\theta)| < \epsilon$.

Then $\int f(x,\theta)\,dG_n(\theta) \to \int f(x,\theta)\,dG(\theta) = f_G(x)$.

Proof. Without loss of generality, let $|x| < n$. By the choice of the partition (3.1) and (3.2), there exists a sequence $\{\theta_{n,j(n)}\}$ such that $\theta_{n,j(n)} \to x$. We also observe that $f(x,\cdot) \in C_0(R)$ and (A2) imply that

(3.7)  for each $\epsilon > 0$ there exist $\delta, M > 0$ such that $|x' - x| < \delta$ and $|\theta| > M$ imply $f(x',\theta) < \epsilon$.

By the definitions of $\mathcal{P}_n$ and $G_n$ given in (3.5) and (3.6) respectively,

(3.8)  $\sum_{\ell=-1}^{m(n)} p_{n,\ell}\,m_{n,\ell}(\theta_{n,j(n)}) \le \int f(\theta_{n,j(n)},\theta)\,dG_n(\theta) \le \sum_{\ell=-1}^{m(n)} p_{n,\ell}\,M_{n,\ell}(\theta_{n,j(n)})$.

Now observe that $0 \le D_n$ (= the difference between the extreme sides of (3.8))

$\le \sup\{|f(x',\theta) - f(x',\theta')| : |x - x'| < \delta,\ |\theta - \theta'| < \delta_n\} + \sup\{f(x',\theta) : |x' - x| < \delta,\ |\theta| \ge n\}$

by the choice of the partition $\{\theta_{n,-1},\dots,\theta_{n,m(n)+1}\}$ and the sequence $\{\theta_{n,j(n)}\}$. This last expression (and hence $D_n$) $\to 0$ due to (A3) and (3.7). Hence, since the lhs of (3.8) $\le f_G(\theta_{n,j(n)}) \le$ rhs of (3.8) due to (ii) and (iii) of (3.5),

(3.9)  $\int f(\theta_{n,j(n)},\theta)\,dG_n(\theta) - f_G(\theta_{n,j(n)}) \to 0$.

But $f_G(\theta_{n,j(n)}) \to f_G(x)$ by (A2) since $\theta_{n,j(n)} \to x$. For the same reason, $\int f(\theta_{n,j(n)},\theta)\,dG_n(\theta) - \int f(x,\theta)\,dG_n(\theta) \to 0$. This completes the proof in view of (3.9).


Corollary 3.1. Let $\mathcal{G} \subset C_0(R)$, (A1), (A2) and (A3) hold for each $x$ in $R$. Then $G_n \xrightarrow{w} G$. If, in addition, $\theta f(x,\theta) \in C(R)$, then $\int \theta f(x,\theta)\,dG_n(\theta) \to \int \theta f(x,\theta)\,dG(\theta)$.

Proof. By Theorem 3.1,

(3.10)  $\int f(x,\theta)\,dG_n(\theta) \to \int f(x,\theta)\,dG(\theta)$ for each $x$ in $R$.

Since $\mathcal{G} \subset C_0(R)$ and (A1) holds, $\mathcal{G}$ generates $C_0(R)$ in the supremum norm (2.1) by Theorem 2.1. Therefore, (3.10) can be extended to $\int g(\theta)\,dG_n(\theta) \to \int g(\theta)\,dG(\theta)$ for each $g$ in $C_0(R)$, which is equivalent to the first result. The second result is a consequence of the first result since $\theta f(x,\theta) \in C(R)$.

Remark 3.1. If $J = [a,b]$ and $I = [c,d]$ with $-\infty < a, b, c,$ and $d < \infty$, then take $\theta_{n,-1} = a < \theta_{n,0} < \dots < \theta_{n,m(n)} < \theta_{n,m(n)+1} = b$ with $\delta_n = \max\{\theta_{n,j} - \theta_{n,j-1} \mid j = 0,1,\dots,m(n)+1\} \to 0$ and solve (3.5) at the $n$th stage when $x$ is in $\{x_1, x_2, \dots, x_{m(n)+1}\}$, where $\{x_1, x_2, \dots\}$ is dense in $I$.


4. Estimation of G when $f_G$ is unknown. In this section, assume that the distribution function $G$ and $f_G$ are unknown, $I = J = R$ and that $X_1,\dots,X_n$ are iid random variables with common density $f_G$. We exhibit $\hat G_n(\cdot)$ $(= \hat G_n(X_1,\dots,X_n,\cdot))$ such that $\hat G_n \xrightarrow{w} G$ almost surely (a.s.). An application of and motivation for the results of this section is given in the lengthy Remark 4.2.

Let $\hat f_G(x)$ $(= \hat f_G(X_1,\dots,X_n,x))$ be an estimator of $f_G(x)$ such that

(A4)  $\|\hat f_G(\cdot) - f_G(\cdot)\| \to 0$ a.s.

where $\|\ \|$ denotes the sup norm. For each fixed $n$, let $\hat{\mathcal{P}}_n$ be the class of solutions obtained for (3.5) when $f_G$ in (ii) and (iii) is replaced by $\hat f_G - \epsilon$ and $\hat f_G + \epsilon$ respectively, where $\epsilon\,(= \epsilon_n)$ is the smallest positive number for which the class $\hat{\mathcal{P}}_n$ is not empty. This method of choosing $\hat{\mathcal{P}}_n$ does not require $\epsilon$ to be known in advance. $\hat{\mathcal{P}}_n$ is well-defined for each $n$, and the method involves a linear programming problem.
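One way to write the choice of $\epsilon_n$ as a single linear program is sketched below. This formulation is an assumption made for illustration (the paper only states that a linear programming problem is involved); the unknowns are $p_{n,-1},\dots,p_{n,m(n)}$ and $\epsilon$, and the constraints are (i) through (iii) of (3.5) with $f_G$ replaced by $\hat f_G \mp \epsilon$.

```latex
\begin{aligned}
\text{minimize}\quad & \epsilon \\
\text{subject to}\quad
 & p_{n,\ell} \ge 0 \ (\ell = -1,\dots,m(n)), \qquad \sum_{\ell=-1}^{m(n)} p_{n,\ell} = 1, \qquad \epsilon \ge 0, \\
 & \sum_{\ell=-1}^{m(n)} p_{n,\ell}\, M_{n,\ell}(x) \;\ge\; \hat f_G(x) - \epsilon, \\
 & \sum_{\ell=-1}^{m(n)} p_{n,\ell}\, m_{n,\ell}(x) \;\le\; \hat f_G(x) + \epsilon,
 \qquad x \in \{\theta_{n,0},\dots,\theta_{n,m(n)}\}.
\end{aligned}
```

Both the objective and every constraint are linear in $(p_n, \epsilon)$, and any $p_n$ attaining the minimum belongs to $\hat{\mathcal{P}}_n$. The minimum may be zero for some samples; the text's requirement of a positive $\epsilon$ is not encoded here.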

Whenever the sample sequence is in the a.s. event $A$ guaranteed to exist by (A4), the $\epsilon\,(= \epsilon_n)$ corresponding to that sample sequence at stage $n$ converges to zero for the following reason. Let $\epsilon_n^* = 2\,\|\hat f_G(x_1,\dots,x_n,\cdot) - f_G(\cdot)\|$. Then $\epsilon_n^* \to 0$ by assumption. Moreover, $\hat{\mathcal{P}}_n$ is not empty for large $n$ since $\|\hat f_G(x_1,\dots,x_n,\cdot) - f_G(\cdot)\| < \epsilon_n^*$ implies that $p_n = \{p_{n,-1},\dots,p_{n,m(n)}\}$, with $p_{n,\ell} = \int_{\theta_{n,\ell}}^{\theta_{n,\ell+1}} dG$ for $\ell = -1,\dots,m(n)$, is a solution belonging to $\hat{\mathcal{P}}_n$. Since $\epsilon\,(= \epsilon_n) \le \epsilon_n^*$, $\epsilon_n \to 0$. Define $\hat G_n(\cdot)$ $(= \hat G_n(X_1,\dots,X_n,\cdot))$ by

(4.1)  $\hat G_n(y) = G_n(y)$ of (3.6) with $p_{n,\ell}$ replaced by $\hat p_{n,\ell}$, where $\hat p_n = \{\hat p_{n,-1},\dots,\hat p_{n,m(n)}\}$ is in $\hat{\mathcal{P}}_n$.

With the above notation, we obtain the following two theorems. The first theorem is an analogue of Theorem 3.1 for $\hat G_n$. The second theorem provides rates of convergence for $f_{\hat G_n}(x) - f_G(x) \to 0$ a.s. and $\to 0$ in $L_2$.

Theorem 4.1. Let (A1) and (A4) and, for each $x$ in $R$, the conditions of Theorem 3.1 hold. Then $\hat G_n \xrightarrow{w} G$ a.s. If, in addition, $\theta f(x,\theta) \in C(R)$, then $\int \theta f(x,\theta)\,d\hat G_n(\theta) \to \int \theta f(x,\theta)\,dG(\theta)$ a.s.

Proof. Let the sample sequence $\{x_1,\dots,x_n,\dots\}$ be a fixed point in the a.s. event $A$ guaranteed to exist by (A4). We show that $\hat G_n(x_1,\dots,x_n,\cdot) \xrightarrow{w} G(\cdot)$.

Unless otherwise stated, let $x$ be fixed. Without loss of generality, let $n > |x|$, let $\epsilon_n$ be as in the discussion preceding (4.1) and let $\epsilon_n^* = 2\,\|\hat f_G(x_1,\dots,x_n,\cdot) - f_G(\cdot)\|$. As in the proof of Theorem 3.1, let $\theta_{n,j(n)} \to x$. By the construction of $\hat{\mathcal{P}}_n$ and $\hat G_n$,

(4.2)  $\sum_{\ell=-1}^{m(n)} \hat p_{n,\ell}\,m_{n,\ell}(\theta_{n,j(n)}) \le \int f(\theta_{n,j(n)},\theta)\,d\hat G_n(\theta) \le \sum_{\ell=-1}^{m(n)} \hat p_{n,\ell}\,M_{n,\ell}(\theta_{n,j(n)})$.

The difference between the extreme sides of (4.2) goes to zero due to (A2) and (A3) as in the proof of Theorem 3.1. Also, by the construction preceding (4.1) and the assumption on $\hat f_G$, up to the vanishing difference between the extreme sides of (4.2), the lhs of (4.2) $\ge \hat f_G(\theta_{n,j(n)}) - \epsilon_n \ge f_G(\theta_{n,j(n)}) - \epsilon_n^* - \epsilon_n$ and the rhs of (4.2) $\le \hat f_G(\theta_{n,j(n)}) + \epsilon_n \le f_G(\theta_{n,j(n)}) + \epsilon_n^* + \epsilon_n$ for large $n$. Now recall that $0 < \epsilon_n \le \epsilon_n^* \to 0$. Hence $\int f(\theta_{n,j(n)},\theta)\,d\hat G_n(\theta) \to \lim f_G(\theta_{n,j(n)}) = f_G(x)$ since (A2) implies that $f_G$ is continuous at $x$ and $\theta_{n,j(n)} \to x$. For the same reasons, $\int f(\theta_{n,j(n)},\theta)\,d\hat G_n(\theta) - \int f(x,\theta)\,d\hat G_n(\theta) \to 0$. Therefore,

(4.3)  $\int f(x,\theta)\,d\hat G_n(\theta) \to f_G(x) = \int f(x,\theta)\,dG(\theta)$ for all $x$ in $R$.

Now the conditions $\mathcal{G} \subset C_0(R)$ and (A1) imply (as in the proofs of Theorem 3.1 and Corollary 3.1) that (4.3) can be extended to $\int g(\theta)\,d\hat G_n(\theta) \to \int g(\theta)\,dG(\theta)$ for all $g$ in $C_0(R)$, which is equivalent to $\hat G_n(x_1,\dots,x_n,\cdot) \xrightarrow{w} G(\cdot)$. Since $\{x_1,\dots,x_n,\dots\}$ is an arbitrary point in $A$ with $P(A) = 1$, the proof of the first part of the theorem is complete. The second part follows from the first part since $\theta f(x,\theta) \in C(R)$.

One advantage of our method of construction of $\hat G_n$ is that the rate results of $\hat f_G$ can be used to obtain the corresponding rate results for $f_{\hat G_n}$, as the following theorem shows. Recall that $\delta_n$ is defined by (3.2).

Theorem 4.2. Let the conditions of Theorem 4.1 hold and let (A4) hold with rate $O(\alpha_n)$ with $\alpha_n \downarrow 0$. If

(A5)  $\sup_{|x' - x| < \delta_n} \sup_\theta \{|f(x,\theta) - f(x',\theta)|\} \le \gamma_n$

with $\gamma_n \downarrow 0$ as $\delta_n \downarrow 0$, then $[\max\{\alpha_n, \gamma_n\}]^{-1}\,|f_{\hat G_n}(x) - f_G(x)| = O(1)$ a.s. (the a.s. set is independent of $x$, but $O(1)$ could depend on $x$). Additionally, if

(A6)  $\sup\{E(\hat f_G(x') - f_G(x'))^2 : |x' - x| < \delta_n\} = O(\beta_n^2)$,

then $[\max\{\alpha_n^2, \beta_n^2, \gamma_n^2\}]^{-1}\,E[(f_{\hat G_n}(x) - f_G(x))^2] = O(1)$. (Again, $O(1)$ could depend on $x$.)

Note. (A5) is actually (A2) with a rate of convergence property.

Proof. Let $\{\theta_{n,j(n)}\}$ be such that $|x - \theta_{n,j(n)}| < \delta_n$. By Theorem 4.1, $\hat G_n \xrightarrow{w} G$ a.s. The results now follow from the following set of inequalities:

(4.4)  $|f_{\hat G_n}(x) - f_G(x)| \le |f_{\hat G_n}(\theta_{n,j(n)}) - f_G(\theta_{n,j(n)})| + |f_{\hat G_n}(x) - f_{\hat G_n}(\theta_{n,j(n)})| + |f_G(x) - f_G(\theta_{n,j(n)})|$
$\le |f_{\hat G_n}(\theta_{n,j(n)}) - f_G(\theta_{n,j(n)})| + 2\gamma_n$

where the second inequality follows from (A5). Now observe that

(4.5)  $|f_{\hat G_n}(\theta_{n,j(n)}) - f_G(\theta_{n,j(n)})| \le |f_{\hat G_n}(\theta_{n,j(n)}) - \hat f_G(\theta_{n,j(n)})| + |\hat f_G(\theta_{n,j(n)}) - f_G(\theta_{n,j(n)})|$
$\le 2\epsilon_n + |\hat f_G(\theta_{n,j(n)}) - f_G(\theta_{n,j(n)})|$

where the last inequality follows from the construction of $\hat G_n$ (for example, see the argument following (4.2)). Now the first result follows from (4.4), (4.5) and (A4), while the second result follows from (4.4), (4.5) and (A6).

Remark 4.1. If $I$ and/or $J$ are finite intervals, then apply the modifications suggested in Remark 3.1.

Remark 4.2. Here, we discuss an application of Theorem 4.1 to the standard empirical Bayes decision problem of Robbins [4]. In an empirical Bayes decision problem, there is a sequence of iid vectors $\{(\theta_n, X_n)\}$ where $\theta_n \sim G$, an unknown distribution, and given $\theta_n = \theta$, $X_n \sim f(\cdot,\theta)$ $(\in \mathcal{F})$. $X_n$ is observable while $\theta_n$ is not. The empirical Bayes problem involves exhibiting $\{t_n(X_1,\dots,X_n)\}$ such that the Bayes risk of using $t_n$ in deciding about $\theta_n$ less the minimum Bayes risk of deciding (using $X_n$) about $\theta_n$ converges to zero, hopefully with a rate. Robbins [4] named such rules asymptotically optimal empirical Bayes rules (a.o.e.B.). In this situation, one can use Theorem 4.1 as follows: Use $X_1,\dots,X_n$ to estimate $G$ by $\hat G_n$ as in Theorem 4.2. Then take $t_{n+1} = t_{n+1}(X_1,\dots,X_{n+1})$ as the Bayes rule of deciding (using $X_{n+1}$) about $\theta_{n+1}$ when the prior distribution is $\lambda_n \Phi + (1 - \lambda_n)\hat G_n$, where $\Phi$ is the standard normal distribution function and $0 < \lambda_n \downarrow 0$ as $n \uparrow \infty$. Such a rule $\{t_n\}$ can not only be shown to be a.o.e.B., but also componentwise admissible under fairly general conditions on $\mathcal{F}$, $G$ and the loss function involved in the definition of the Bayes risk. For example, in the problem of empirical Bayes squared error loss estimation of $\theta$, the above method and the dominated convergence theorem provide a.o.e.B. estimators which are componentwise admissible (with $\theta$ restricted to $[a,b]$) provided $G$ is in the class of all distributions with support in $[a,b]$, $-\infty < a < b < \infty$.

The compactness of the support of $G$ is not an unrealistic assumption. If the prior distribution does not have a compact support, the asymptotic optimality of the above procedure can be obtained by appealing to an unpublished lemma of Le Cam and Scheffé's theorem. All these details, which are too long, will appear elsewhere. Before closing, we note that one has to use both parts of Theorem 4.1, namely, the convergence of $f_{\hat G_n}$ to $f_G$ and that of $\int \theta f(\cdot,\theta)\,d\hat G_n$ to $\int \theta f(\cdot,\theta)\,dG$, to obtain the above empirical Bayes results.
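Under squared error loss the Bayes estimate of $\theta$ given $X = x$ is the posterior mean $\int \theta f(x,\theta)\,dG(\theta) / \int f(x,\theta)\,dG(\theta)$. The sketch below evaluates this ratio for a discrete prior such as an estimate $\hat G_n$; it assumes a standard normal location family and omits the $\lambda_n \Phi$ component of the prior described above. The function names and numbers are hypothetical.

```python
import math


def phi(z):
    """Standard normal density (assumed component density)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def posterior_mean(x, atoms, weights):
    """Posterior mean of theta given X = x under a discrete prior on `atoms`."""
    f = sum(w * phi(x - t) for t, w in zip(atoms, weights))      # integral of f(x, .) dG
    h = sum(w * t * phi(x - t) for t, w in zip(atoms, weights))  # integral of theta f(x, .) dG
    return h / f


# Hypothetical discrete prior with mass 1/2 at -1 and 1/2 at +1.
print(posterior_mean(0.5, [-1.0, 1.0], [0.5, 0.5]))
```

Both the numerator and the denominator are integrals against the prior, which is why the convergence of both quantities in Theorem 4.1 is needed.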

To obtain rate of convergence results in the above empirical Bayes estimation problem, we make the following change. Instead of solving the equations as described in the paragraph preceding (4.1), solve the equations (3.5) with $f_G$ in (ii) and (iii) replaced by $\hat f_G - \eta$ and $\hat f_G + \eta$ respectively, along with the following equations:

(iv)  $\sum_{\ell=-1}^{m(n)} p_{n,\ell}\,\sup\{\theta f(x,\theta) \mid \theta_{n,\ell} \le \theta \le \theta_{n,\ell+1}\} \ge \hat h_G(x) - \eta$

(v)  $\sum_{\ell=-1}^{m(n)} p_{n,\ell}\,\inf\{\theta f(x,\theta) \mid \theta_{n,\ell} \le \theta \le \theta_{n,\ell+1}\} \le \hat h_G(x) + \eta$

where $\hat h_G(\cdot)$ is an estimator of $h_G(\cdot) = \int \theta f(\cdot,\theta)\,dG$ and $\eta$ is the smallest positive number for which the above five equations (i) through (v) can be solved simultaneously. Such a solution, as in Theorem 4.2, will lead to simultaneous rates for the mean square convergences of $f_{\hat G_n}$ and $h_{\hat G_n}$ to $f_G$ and $h_G$ respectively. In turn, these mean square convergence results can be applied to obtain rates in the above empirical Bayes estimation problem along with componentwise admissibility, since the function to be estimated is simply $h_G(\cdot)/f_G(\cdot)$ based on $X_1,\dots,X_n$. This method of obtaining componentwise admissible procedures has been used in a nonparametric context in Susarla and Phadia [6].

5. Examples. We consider two examples, one involving a location parameter family of densities on $I = R = (-\infty,\infty)$ and the other involving a scale parameter family of densities on $I = [0,\infty)$. All densities are wrt Lebesgue measure on the real line or on $[0,\infty)$.

To consider the location parameter case, assume that

(5.1)  $h$ is a continuous density with $h(x) \to 0$ as $|x| \to \infty$.

If

(5.2)  $f(x,\theta) = h(x - \theta)$, $-\infty < \theta, x < \infty$,

then we have

Theorem 5.1. If $f$ is defined via (5.2) and satisfies (A1), then $G_n$ of (3.6) $\xrightarrow{w} G$. If, in addition,

(5.3)  $\sup\{|h'(t)| : t \in R\} < \infty$

and

(5.4)  $\hat f_G(x) = (n a_n)^{-1} \sum_{j=1}^{n} k((x - X_j)/a_n)$

where $X_1,\dots,X_n$ are iid $f_G$ $(f_G(x) = \int h(x - \theta)\,dG(\theta))$, $k$ is the standard normal density and $a_n = n^{-c}$, then $\hat G_n$ of (4.1) $\xrightarrow{w} G$ a.s. provided $\alpha_n$ of (A4) $= n^{-c}$ with $0 < 4c < 1$. If $\delta_n = O(n^{-\nu})$ with $\nu > 1$, then $|f_{\hat G_n}(x) - f_G(x)| = O(n^{-c})$ a.s. Moreover, $E[(f_{\hat G_n}(x) - f_G(x))^2] = O(n^{-\min\{2c,\,1-2c\}})$.
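The kernel estimator (5.4) can be sketched directly. The code below assumes the standard normal kernel $k$ and the bandwidth $a_n = n^{-c}$ with $0 < 4c < 1$ as in the theorem; the function name, the default value of `c` and the sample are hypothetical choices for the example.

```python
import math


def phi(z):
    """Standard normal density, used as the kernel k."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_density_estimate(x, sample, c=0.2):
    """Evaluate (5.4): (n a_n)^{-1} * sum_j k((x - X_j) / a_n) with a_n = n^{-c}."""
    if not 0.0 < 4.0 * c < 1.0:
        raise ValueError("c must satisfy 0 < 4c < 1")
    n = len(sample)
    a_n = n ** (-c)
    return sum(phi((x - x_j) / a_n) for x_j in sample) / (n * a_n)


# Hypothetical sample of size 4.
sample = [-0.5, 0.1, 0.4, 1.3]
print(kernel_density_estimate(0.0, sample))
```

In the construction of $\hat G_n$ this estimate would be evaluated at the partition points $\theta_{n,0},\dots,\theta_{n,m(n)}$ and used in place of $f_G$ in (3.5).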

Proof. The first part of the theorem follows from Corollary 3.1 upon observing that (5.1) implies the conditions (A2) and (A3) and that $\mathcal{G} \subset C_0(R)$. For the second result, observe that (5.1) and (5.3), respectively, imply that $f_G$ and $f_G'$ are bounded. Therefore Corollary 2.6 with $r = 0$ of Schuster [5] obtains that

$n^{c}\,\|\hat f_G - f_G\| \to 0$ a.s.

where $\|\ \|$ stands for the supremum norm and $0 < 4c < 1$. Thus (A4) also holds with $\alpha_n = n^{-c}$. Now Theorem 4.1 obtains the result $\hat G_n \xrightarrow{w} G$ a.s.

The third part of the theorem follows since

$\sup_{|x' - x| < \delta_n} \sup_\theta \{|f(x,\theta) - f(x',\theta)|\} \le \delta_n \|h'\|$

implying (A5) with $\gamma_n = \delta_n \|h'\| = O(n^{-\nu})$. To obtain the $L_2$ convergence result, we verify (A6) with $\beta_n^2 = n^{-\min\{2c,\,1-2c\}}$ as follows:

(5.5)  $E[(\hat f_G(x') - f_G(x'))^2] = \mathrm{var}(\hat f_G(x')) + (E[\hat f_G(x')] - f_G(x'))^2$.

By the definition of $\hat f_G$ and Lemma 2.3 of Schuster [5], $|E[\hat f_G(x')] - f_G(x')| \le c_1 a_n$ for some constant $c_1$, and since $k$ is bounded by unity and since $X_1,\dots,X_n$ are iid, $\mathrm{var}(\hat f_G(x')) \le (n a_n^2)^{-1}$. Hence, since $a_n = n^{-c}$, (5.5) $= O(n^{-\min\{2c,\,1-2c\}})$. This verifies (A6) since $0 < 4c < 1$ and so the last result follows.
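The arithmetic behind the exponent $\min\{2c, 1-2c\}$ can be written out as follows, using only the bias and variance bounds quoted in the proof and the bandwidth $a_n = n^{-c}$.

```latex
\begin{aligned}
E\big[(\hat f_G(x') - f_G(x'))^2\big]
  &= \operatorname{var}(\hat f_G(x')) + \big(E[\hat f_G(x')] - f_G(x')\big)^2 \\
  &\le (n a_n^2)^{-1} + c_1^2 a_n^2 \\
  &= n^{-(1-2c)} + c_1^2\, n^{-2c} \\
  &= O\big(n^{-\min\{2c,\,1-2c\}}\big).
\end{aligned}
```

Under the restriction $0 < 4c < 1$ one has $2c < 1/2 < 1 - 2c$, so the squared bias term dominates and the bound is of order $n^{-2c}$.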


For considering the scale parameter case, assume that $h$ is a continuous density on $[0,\infty)$ with

(5.6)
(i)  $\sup\{y\,h(y) \mid y \ge n\} \to 0$, $\sup\{|h'(y)| : y > 0\} < \infty$,
(ii)  $\sup\{y\,|h'(y)| : y > 0\} < \infty$, and
(iii)  $\sup\{y^2\,|h'(y)| : y > 0\} < \infty$.

If

(5.7)  $f(x,\theta) = \theta h(x\theta)$ for $x, \theta > 0$,

then we have the following theorem whose proof is omitted since it is similar to that of Theorem 5.1.


Theorem 5.2. If $f$ is defined via (5.7) and satisfies (A1), then $G_n$ of (3.6) $\xrightarrow{w} G$. If, in addition,

(5.8)  $\int \theta^2\,dG(\theta) < \infty$

and $\hat f_G$ is defined by (5.4) where $X_i$ are iid $f_G$ $(= \int \theta h(x\theta)\,dG(\theta))$, then $\hat G_n$ of (4.1) $\xrightarrow{w} G$ a.s. provided $\alpha_n$ of (A4) $= n^{-c}$ with $0 < 4c < 1$. If $\delta_n = O(n^{-\nu})$ with $\nu > 1$, then $|f_{\hat G_n}(x) - f_G(x)| = O(n^{-c})$ a.s. Moreover, $E[(f_{\hat G_n}(x) - f_G(x))^2] = O(n^{-\min\{2c,\,1-2c\}})$.


Remark 5.1. Theorem 5.1 includes the family of normal densities indexed by the mean and with known variance, while Theorem 5.2 includes the family of scale parameter exponential distributions with the second moment of the mixing distribution finite.

Remark 5.2. The results of this paper can be extended when both the arguments $x$ and $\theta$ are vectors and can be applied to mixtures of discrete probability distributions with appropriate changes. It is well-known that the family of binomial distributions $\{B(n,p) \mid 0 < p < 1\}$ is not identifiable. That this is the case can be readily seen from Theorem 2.1 since the class of polynomials of degree at most $n$ does not generate $C_0[0,1]$.
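A small numerical instance of this non-identifiability: for $n = 2$ the mixed binomial probabilities depend on the mixing distribution only through its first two moments, so two different mixing distributions with the same mean and second moment give the same mixture. The two distributions below are hypothetical choices constructed to share mean $1/2$ and second moment $5/16$; exact rational arithmetic is used so the comparison is not affected by rounding.

```python
from fractions import Fraction as Fr
from math import comb


def binomial_mixture_pmf(n, atoms, weights):
    """P(X = k), k = 0..n, when p has a discrete distribution and X | p ~ B(n, p)."""
    return [
        sum(w * comb(n, k) * p**k * (1 - p) ** (n - k) for p, w in zip(atoms, weights))
        for k in range(n + 1)
    ]


# Two different mixing distributions on (0, 1) with equal first and second moments.
G_atoms, G_weights = [Fr(1, 4), Fr(3, 4)], [Fr(1, 2), Fr(1, 2)]
H_atoms, H_weights = [Fr(1, 8), Fr(1, 2), Fr(7, 8)], [Fr(2, 9), Fr(5, 9), Fr(2, 9)]

pmf_G = binomial_mixture_pmf(2, G_atoms, G_weights)
pmf_H = binomial_mixture_pmf(2, H_atoms, H_weights)
print(pmf_G == pmf_H)  # → True
```

In the language of Theorem 2.1, the functions $p \mapsto P(X = k \mid p)$ are polynomials of degree at most $n$, and integrals of such polynomials cannot separate distributions that agree on their first $n$ moments.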

6. Some other estimators and comparison with our estimator. We briefly describe three methods of estimation of $G$ and compare their results with those presented here. In the method by Deely and Kruse [2], the finite interval $\Lambda$ (on which $G$ is assumed to have support) is partitioned by the points $\lambda_{1n},\dots,\lambda_{nn}$ so that there is a sequence $\{\mathcal{H}_n\}$ of classes of distributions such that the support of each distribution in $\mathcal{H}_n$ is in $\{\lambda_{1n},\dots,\lambda_{nn}\}$ and for every $G$ with support in $\Lambda$, there exists a sequence $\{G_n\}$ with $G_n$ in $\mathcal{H}_n$ and $G_n \xrightarrow{w} G$. Then their method chooses a $G_n$ in $\mathcal{H}_n$ which minimizes the sup distance $\|F_n - F_H\|$ where $H$ is in $\mathcal{H}_n$, $nF_n(\cdot) = \sum_{j=1}^{n} I[X_j \le \cdot]$ and $F_H(\cdot) = \int F(\cdot,\theta)\,dH$, where $F(\cdot,\theta)$ is a distribution function for each $\theta$. They point out that their method involves finding an optimal strategy in a game with a payoff matrix which depends on $F$ and $X_1,\dots,X_n$. They point out that $\Lambda$ can be taken to be $R$. Choi [1] uses the Wolfowitz distance function $d(G_n, G) = \int (G_n(x) - G(x))^2\,dG(x)$ and, in the words of Deely and Kruse [2], the computational feasibility of Choi's method is not clearly established. Moreover, Choi's [1] method needs the solution of a dynamic programming problem, and considers only finite mixtures. Meeden [3] constructs a probability distribution on the class of all probability distributions on $[0,\infty)$ and then shows that the Bayes estimate based on the first $n$ observations corresponding to the constructed prior converges weakly to the true element $G_0$ in that class. Again, finding estimates by Meeden's [3] method appears to be as hard as the computation required in this paper. Our estimators have the simplicity that they need only a linear programming computation (see the note following (3.6)), have some distance properties (Theorem 4.2), and will give componentwise admissible empirical Bayes estimators with and without rates with a small amount of extra work if the support of the prior is in a compact set. It is not clear how one can recover rate results for the estimates of the density $f_G$ and of $h_G$ from the weak convergence results of the above three authors.
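For comparison, the sup distance $\|F_n - F_H\|$ used in the Deely and Kruse criterion can be evaluated exactly when $F_H$ is continuous, since the supremum is then attained at a jump of the empirical distribution function. The sketch below assumes a standard normal location family and a discrete candidate $H$; it only evaluates the distance for one candidate and does not perform the minimization over the class of candidates. All names and numbers are hypothetical.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_cdf(x, atoms, weights):
    """F_H(x) = sum_i w_i * Phi(x - theta_i) for a discrete H (normal location family)."""
    return sum(w * Phi(x - t) for t, w in zip(atoms, weights))


def sup_distance(sample, atoms, weights):
    """sup_x |F_n(x) - F_H(x)|, checked just before and at each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = mixture_cdf(x, atoms, weights)
        d = max(d, abs(i / n - F), abs((i - 1) / n - F))
    return d


# Hypothetical sample and candidate mixing distribution.
print(sup_distance([-1.2, 0.3, 0.8, 2.1], [-1.0, 2.0], [0.3, 0.7]))
```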